ABSTRACT

This work is dedicated to the development of a 
technology for unbiased, high-throughput DNA
methylation profiling of large genomic regions. In 
this method, unmethylated and methylated DNA 
fractions are enriched using a series of treatments 
with methylation sensitive restriction enzymes, and 
interrogated on microarrays. We have investigated 
various aspects of the technology including its rep- 
licability, informativeness, sensitivity and optimal 
PCR conditions using microarrays containing oligo- 
nucleotides representing 100 kb of genomic DNA 
derived from the chromosome 22 COMT region in 
addition to 12192 element CpG island microarrays. 
Several new aspects of methylation profiling are 
provided, including the parallel identification of 
confounding effects of DNA sequence variation, 
the description of the principles of microarray 
design for epigenomic studies and the optimal 
choice of methylation sensitive restriction enzymes. 
We also demonstrate the advantages of using the 
unmethylated DNA fraction versus the methylated 
one, which substantially improve the chances of 
detecting DNA methylation differences. We applied 
this methodology for fine-mapping of methylation 
patterns of chromosomes 21 and 22 in eight indi- 
viduals using tiling microarrays consisting of over 
340 000 oligonucleotide probe pairs. The principles 
developed in this work will help to make epigenetic 
profiling of the entire human genome a routine 
procedure. 



INTRODUCTION 

Over the last decade the field of DNA methylation has grown 
dramatically and become one of the most dynamic and rapidly 
developing branches of molecular biology. The methyl group 
at the fifth position of the cytosine pyrimidine ring, which is
present in about 80% of CpG-dinucleotides in the human 
genome, can be of major functional significance and is 
regarded as the 'fifth base' of the genome (1). DNA methyla- 
tion, along with histone modifications (acetylation, methyla- 
tion, phosphorylation and the like), are referred to as 
epigenetic phenomena that control various genomic functions 
without a change in nucleotide sequence (2). Such functions 
include meiotic and mitotic recombination, replication, control 
of 'parasitic' DNA elements, establishing and maintenance of 
gene expression profiles, X chromosome inactivation as well as 
regulation of developmental programming and cell differenti- 
ation (3-6). Aberrations in epigenetic regulation, or 'epimuta- 
tions', cause several paediatric syndromes (Prader-Willi 
[OMIM #176270], Angelman [OMIM #105830], Beckwith- 
Wiedemann [OMIM #130650] and Rett [OMIM #312750]) 
(7) and may also predispose to cancer (8). 

Our understanding of the peculiarities of DNA methylation 
in the human genome is still very superficial. Based on the 
review of available publications, our estimate is that <0.1% of
the genome has been subjected to a detailed DNA modification 
analysis. The recently completed Human Genome sequencing 
project did not attempt to differentiate between methylated and 
unmethylated cytosines. To some extent our understanding of 
the dynamic state of genome-wide DNA methylation has been 
hampered by the lack of high-throughput technologies that 
would interrogate DNA methylation profiles over large
genomic regions. A gold standard technique in DNA methylation
studies, the bisulfite modification-based fine mapping of
metC (9), although precise, is very labour intensive and in
most cases limited to short DNA fragments, often less than a
kilobase.

The advent of microarray technologies that enabled the inter- 
rogation of a large number of DNA/RNA fragments in a highly 
parallel fashion has opened new opportunities for epigenetic 
studies (10). A number of microarray-based technologies used 
for epigenetic analyses are already available (11-23). However,
all of these methods have some limitations, which render them
unsuitable for some experimental setups. Additionally, many
technological parameters, such as the influence of DNA
sequence variations, amplification conditions and sensitivity
of the methods, have not been investigated before. Here we
present a detailed analysis of various parameters of epigenetic
profiling and provide a substantially improved microarray-based
high-throughput technology for DNA methylation
profiling of DNA regions that span from hundreds of kilobases
to megabases. Eventually, this technology will be applied to 
the entire human genome, as exemplified by the methylation 
mapping of chromosomes 21 and 22 as reported here. 



MATERIALS AND METHODS 

Microarray fabrication and data processing 

COMT and CpG island microarrays were printed on Corning
CMT-GAPS II slides (Corning Life Sciences, Acton, MA)
using a VersArray ChipWriter Pro System (Bio-Rad Laboratories,
Hercules, CA). For the COMT array, we designed 384
oligonucleotides (Operon/Qiagen, US), each 50 bases long,
representing every restriction fragment flanked by HpaII,
Hin6I and AciI restriction sites. In addition, control DNA
fragments containing λ phage, pBR322, φX174 and pUC57
sequences were spotted on the slide. Each oligonucleotide was
diluted to a 25 µM solution and spotted four times to give a
total of 1536 elements. In addition, 192 blank spots consisted
of SSC buffer and 48 spots contained Arabidopsis clones. The
human CpG island array contains 12 192 sequenced CpG
island clones derived from a CpG island library that was originally
created with MeCP2 DNA binding columns (24,25).

Hybridized arrays were scanned on a GenePix 4000A scanner
(Axon Instruments, Union City, CA) and analysed using
the GenePix 6.0 software. The GenePix PMT voltages for the Cy3
and Cy5 channels were balanced with the histogram feature of
the scanner software to ensure a similar dynamic range for the
two channels. Final scans were taken at 10 µm resolution, and
images for each channel were saved as separate 16-bit TIFF
files. The emission signals for each channel were determined
by subtracting the local background from its corresponding
median intensity. These raw data were either exported
into a custom Excel spreadsheet for subsequent data analysis
or directly imported into the Acuity 4.0 software (Axon
Instruments). The resulting datasets were normalized to the
normalization features (spike-DNAs) and for signal intensity
(Lowess normalization).
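
The background subtraction and spike-based centring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GenePix or Acuity implementation: the function names and the `floor` parameter are hypothetical, log ratios are taken as log10(Cy5/Cy3), and the intensity-dependent Lowess step is deliberately left out.

```python
import math
from statistics import median


def background_subtract(median_fg, local_bg, floor=1.0):
    """Signal = median foreground minus local background.

    `floor` is an assumed safeguard that keeps the signal positive so that
    a logarithm can be taken afterwards.
    """
    return max(median_fg - local_bg, floor)


def spike_normalized_log_ratios(cy5, cy3, is_spike):
    """Centre log10(Cy5/Cy3) ratios on the median ratio of the spike controls."""
    raw = [math.log10(r / g) for r, g in zip(cy5, cy3)]
    offset = median(v for v, spike in zip(raw, is_spike) if spike)
    return [v - offset for v in raw]


# Hypothetical spot intensities after background subtraction.
cy5 = [background_subtract(250, 50), background_subtract(450, 50), background_subtract(1050, 50)]
cy3 = [background_subtract(150, 50), background_subtract(250, 50), background_subtract(150, 50)]
ratios = spike_normalized_log_ratios(cy5, cy3, is_spike=[True, True, False])
```

After centring, the spike controls sit at a log ratio of zero, so any remaining deviation in the other spots is expressed relative to the spikes.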

Profiling of unmethylated sites in the brain tissue of eight
adults was carried out using a tiling array spanning ~12 Mb of
non-repetitive sequence of chromosomes 21 and 22 (q arms),
with probes spaced on average every 35 bp center-to-center
(26). The genomic DNA from these individuals was cut with
HpaII and Hin6I, amplified and hybridized to the microarray
as described previously (26,27). Unprocessed total genomic
DNA from the same brain region (prefrontal cortex) was used
as a control. Unmethylated sites were defined using a two-step
analysis approach similar to the one used to determine transcription
factor binding sites in the chromatin immunoprecipitation
(ChIP)-chip assay (27). First, a smoothing-window
Wilcoxon approach was applied to generate a P-value
graph for each individual where probe signal from the enriched
fraction was compared with the total genomic DNA in a one-sided
upper paired test. The window used in this report was
501 bp. Second, three thresholds were applied to determine the
boundaries of the unmethylated site: (i) an individual probe
threshold of P < 10⁻⁴ to determine if a probe is significantly
enriched in the unmethylated fraction compared with the 
control total genomic DNA; (ii) the maximum distance 
between the two positive probes set to 250 bp and (iii) 
the minimal size of a site set to 1 bp. The graphs can be 
downloaded from the internet (see Web resources). All 
coordinates and annotation analysis were done on the April 
2003 version of the genome. 
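
The second step of this analysis, turning per-probe P-values into site boundaries, can be sketched as below. It is a simplified illustration, not the published pipeline: the function and argument names are hypothetical, the P-value cut-off is supplied by the caller, site size is taken as the distance spanned by the positive probes, and the Wilcoxon smoothing step is assumed to have been run beforehand.

```python
def call_unmethylated_sites(positions, p_values, p_threshold, max_gap=250, min_size=1):
    """Merge significant probes into sites.

    positions   -- probe coordinates in bp, sorted ascending
    p_values    -- one P-value per probe (enriched fraction vs. total gDNA)
    p_threshold -- probes with P below this value count as positive
    max_gap     -- maximum distance (bp) between two positive probes in one site
    min_size    -- minimal size (bp) of a reported site
    """
    sites = []
    start = end = None
    for pos, p in zip(positions, p_values):
        if p >= p_threshold:
            continue  # probe is not significantly enriched
        if start is None:
            start = end = pos
        elif pos - end <= max_gap:
            end = pos  # extend the current site
        else:
            if end - start + 1 >= min_size:
                sites.append((start, end))
            start = end = pos  # open a new site
    if start is not None and end - start + 1 >= min_size:
        sites.append((start, end))
    return sites


# Hypothetical probes; the threshold value here is only an example.
demo_sites = call_unmethylated_sites(
    positions=[100, 135, 170, 600, 635, 1000],
    p_values=[1e-6, 0.5, 1e-5, 1e-7, 1e-6, 0.2],
    p_threshold=1e-4,
)
```

With a minimal site size of 1 bp, a single positive probe is already reported as a site, which matches the permissive third threshold described above.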

Methylation-sensitive digestion of genomic DNA (gDNA)

Prior to treatment with restriction enzymes, gDNA was supplemented
with 'spike'-DNAs (different concentrations of λ
and Arabidopsis fragments), which were used as controls for
signal normalization. For enrichment of the unmethylated
fraction, depending on the number of CpG dinucleotides to
be interrogated, several combinations of the methylation-sensitive
enzymes HpaII, Hin6I, AciI and HpyCH4IV were used.
gDNA was cleaved with a cocktail of these enzymes (10 U/µl
in 2× Y+/Tango buffer, Fermentas Life Sciences, Lithuania)
for 8 h at 37°C. For enrichment of the methylated fraction,
gDNA was cleaved by TasI or Csp6I (10 U/µl in G+ buffer,
Fermentas) for 8 h at 65°C (TasI) or at 37°C (Csp6I). After the
restriction reaction, TasI was inactivated by 0.5 M EDTA.

Adaptor-ligation 

For the ligation step, gDNA was supplemented with 8 GE
MspI-cleaved pBR322 plasmid (1 GE = 1.45 pg per 1 µg
gDNA), which was used as control for a potential ligation
bias. The ends of the cleaved DNA fragments were ligated
to the unphosphorylated adaptors. Our adaptors contained a
sequence-specific protruding end, a non-target homologous
core sequence, a specific antisense-overhang that prevents
tandem repeat formation and blunt-end ligation, a 'disruptor'
sequence that interrupts the original restriction sites after ligation,
a new non-palindromic Alw26I (BsmAI) restriction site
that enables the blunt-end cleavage of the adaptor from the
target sequences (e.g. for library enrichment) and a non-5'-complementary
end. The CpG-overhang specific universal
adaptor 'U-CG1' for the unmethylated DNA fraction ligates
to DNA fragments generated by 11 CpG-methylation-sensitive
restriction enzymes, HpaII, Hin6I (HinP1I), HpyCH4IV,
Bsu15I (ClaI, BspDI), AciI (SsiI), Psp1406I (AclI),
Bsp119I (AsuII), Hin1I (AcyI, BsaHI), XmiI (AccI), NarI and
BstBI (FspII), and also by TaqI and MspI, which are not affected
by methylation of the internal cytosine. The adaptor represents
the annealing product of the two primers U-CG1a,
5'-CGTGGAGACTGACTACCAGAT-3', and U-CG1b,
5'-AGTTACATCTGGTAGTCAGTCTCCA-3'.
The AATT-overhang specific adaptor 'AATT-1' for the
methylated DNA fraction fits to DNA ends produced by the
restriction enzyme TasI (TspEI), whereas the 'TA-1' adaptor
fits to ends produced by Csp6I, BfaI or MseI:

AATT-1a, 5'-AATTGAGACTGACTACCAGAT-3'; AATT-1b,
5'-AGTTACATCTGGTAGTCAGTCTC-3'; TA-1a,
5'-TATGAGACTGACTACCAGAT-3'; and TA-1b,
5'-AGTTACATCTGGTAGTCAGTCTCA-3'.
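
A quick way to check that each primer pair anneals into an adaptor with the intended protruding end is to compare one strand with the reverse complement of the other. The sketch below is illustrative only (the helper names are hypothetical); it uses the six primer sequences listed above and reports the unpaired 5' bases of the 'a' strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def five_prime_overhang(strand_a, strand_b):
    """Unpaired 5' bases of strand_a once it is annealed to strand_b.

    The longest suffix of strand_a that pairs with the 3' end of strand_b
    is located; whatever precedes it is the protruding end. Returns None
    if the two strands do not pair at all.
    """
    rc_b = reverse_complement(strand_b)
    for k in range(len(strand_a)):
        if rc_b.startswith(strand_a[k:]):
            return strand_a[:k]
    return None


ADAPTORS = {
    "U-CG1": ("CGTGGAGACTGACTACCAGAT", "AGTTACATCTGGTAGTCAGTCTCCA"),
    "AATT-1": ("AATTGAGACTGACTACCAGAT", "AGTTACATCTGGTAGTCAGTCTC"),
    "TA-1": ("TATGAGACTGACTACCAGAT", "AGTTACATCTGGTAGTCAGTCTCA"),
}
overhangs = {name: five_prime_overhang(a, b) for name, (a, b) in ADAPTORS.items()}
```

The extra bases at the 5' end of each 'b' primer remain unpaired as well, which corresponds to the non-5'-complementary end mentioned in the adaptor description.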

All adaptors were prepared by mixing equimolar amounts of
the primer pairs, incubating the mixture at 80°C for 5 min, and
then cooling it down to 4°C at 1°C/min. The double-stranded
adaptors [200 pmol/µl] were added at 0.1 pmol
per enzyme for each ng of the cleaved DNA (e.g. 0.3 pmol/ng
in a triple-digest HpaII/Hin6I/AciI). The ligation mixture
with 400 ng template DNA was supplemented with 2 µl 10×
ligation buffer (Fermentas), 1 µl ATP [10 mM] and water to
18 µl. The reaction was started in a thermal cycler at 45°C for
10 min, chilled on ice and 2 µl T4 ligase (Fermentas) was
added. The ligation reaction was carried out at 22°C for 18 h,
followed by a heat-inactivation step at 65°C for 5 min. The
mixture was then cooled down to room temperature at 1°C/min
and stored at 4°C for subsequent procedures.
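
The adaptor dosing rule (0.1 pmol per enzyme per ng of cleaved DNA, from a 200 pmol/µl stock) reduces to simple arithmetic. The helper below is a hypothetical convenience function written for illustration, not part of the published protocol.

```python
def adaptor_amount(dna_ng, n_enzymes, pmol_per_enzyme_per_ng=0.1, stock_pmol_per_ul=200.0):
    """Return (pmol of adaptor, µl of adaptor stock) for one ligation."""
    pmol = dna_ng * n_enzymes * pmol_per_enzyme_per_ng
    return pmol, pmol / stock_pmol_per_ul


# 400 ng of template cleaved with the HpaII/Hin6I/AciI triple digest.
pmol_needed, stock_ul = adaptor_amount(dna_ng=400, n_enzymes=3)
```

For the 400 ng triple-digest example this gives 120 pmol of adaptor, i.e. 0.6 µl of the stock.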

PCR

To control for a potential PCR bias, the DNA mixture was
supplemented with 2 GE φX174 plasmid (1 GE = 1.8 pg of
φX174 corresponding to 1 µg gDNA) that was cut with
HpyCH4IV and ligated to the adaptor. PCR amplifications
were conducted for up to 25 cycles. A standard aminoallyl-PCR
mixture included 400 ng of the ligate, 40 µl of 10×
reaction buffer (Sigma), 42 µl MgCl2 [25 mM], 3 µl
aminoallyl-dNTP mix [containing 15 mM aminoallyl-dUTP,
10 mM dTTP and 25 mM each dCTP, dGTP and
dATP], 200 pmol primer (U-CG1a, AATT-1b or TA-1b,
respectively), 3 µl Taq enzyme (5 U/µl, NEB) and water to
a final volume of 400 µl. For PCR conditions and generation of
dye-coupled adaptor products see Supplementary Data.
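
The genome-equivalent (GE) amounts quoted for the spike plasmids can be reproduced with a short calculation: one GE is the plasmid mass that contains as many plasmid copies as there are genome copies in the gDNA. The sketch assumes a haploid genome size of 3.0 × 10^9 bp, plasmid lengths of 4361 bp (pBR322) and 5386 bp (φX174), and 1 µg of gDNA as the reference mass; these values and the function name are assumptions made for illustration and are not stated in the text.

```python
def genome_equivalent_pg(plasmid_bp, gdna_ug=1.0, genome_bp=3.0e9):
    """Mass of plasmid (pg) that equals one genome equivalent of `gdna_ug` µg gDNA."""
    pg_per_ug = 1.0e6
    return plasmid_bp / genome_bp * gdna_ug * pg_per_ug


pbr322_ge = genome_equivalent_pg(4361)   # about 1.45 pg per µg gDNA
phix174_ge = genome_equivalent_pg(5386)  # about 1.8 pg per µg gDNA
```

Under these assumptions, the 2 GE of φX174 used as PCR control would correspond to roughly 3.6 pg of plasmid per µg of gDNA.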

Array hybridizations 

Each microarray slide was prehybridized with a mixture consisting
of DIG Easy Hyb (Roche Diagnostics), 25 µg/ml tRNA
and 200 µg/ml BSA. The printed area was covered with the
prehybridization mixture under a coverslip for 1 h at 45°C. The 
microarray slides were then washed in two changes of water 
for 2 min at 45°C, followed by two wash-steps at room tem- 
perature and a final wash-step in isopropanol for 1 min. The 
slides were immediately blown dry with pressurized air and 
stored for hybridization. The hybridization mixtures were then 
pipetted onto the arrays and covered with Sigma Hybri-slips. 
The microarrays were placed in hybridization chambers 
(Corning Microarray Technologies, NY) and incubated on a
level surface for 16 h at 42°C for the COMT-arrays and 
44-52°C for the CpG island microarrays in a covered water 
bath. The coverslips were removed by immersion of the arrays
in a wash solution containing 2x SSC and 0.5% SDS (washing 
buffer I). The array was washed twice for 15 min at 42-52°C in 
washing buffer I (low stringency), followed by two wash-steps 
in washing buffer II (0.5x SSC, 0.5% SDS), followed by 2 min
of incubation in water. The slides were then rinsed quickly 
in isopropanol and finally dried with pressurized air. 



The hybridization method used for the chromosome 21 and 
22 tiling arrays was described before (26,27).

Whole genome amplification 

Genomic DNA was amplified using the GenomiPhi Kit 
(Amersham Biosciences) according to the manufacturer's protocol.
Briefly, 10 ng of gDNA (1 µl) was mixed with 9 µl of
sample buffer, denatured at 95°C for 3 min, cooled on ice and
then added to 9 µl of reaction buffer and 1 µl of Phi29 DNA
polymerase. The reaction was incubated at 30°C for 16 h and 
then inactivated at 65°C for 10 min. 

Bisulfite sequencing 

The methylation status of a number of CpG islands was
analysed by direct sequencing of sodium bisulfite-modified
gDNA (9). gDNA samples were subjected to bisulfite modification
using a standard protocol (28). The primer sequences,
PCR conditions and cloning methods are provided in the
Supplementary Data.

Genomic DNA 

Genomic DNA from all tissues was purified using standard 
laboratory methods (Phenol-Chloroform or Qiagen Blood and 
Cell DNA Midi columns). To avoid cross reactivity of amine 
groups with the aminoallyl-labeling procedure, DNA samples 
were stored in 0.5 M POPSO buffer (pH 8.0) instead of 
Tris-EDTA. Male placental DNA was purchased from 
Sigma and the post mortem brain samples were provided 
by the Stanley Medical Research Institute. All parts of the 
study were approved by the CAMH review/ethics board. 

Web resources 

All chromosome 21/22 tiling array data can be viewed in the
UCSC genome browser available via the methylation database
at www.epigenomics.ca. Additionally, the complete tiling
array source data plus graphs that can be viewed in the Integrated
Genome Browser (Affymetrix;
www.affymetrix.com/support/developer/downloads/TilingArrayTools/index.affx)
can be downloaded at
http://transcriptome.affymetrix.com/download/DataMethPaper
(case sensitive). All coordinates and
annotation analysis were done on the April 2003 version of the
genome. SNP data were derived from the SNP consortium,
www.ncbi.nlm.nih.gov/SNP.

OMIM numbers are derived from Online Mendelian Inheritance
in Man (OMIM), http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM.
Genome annotations were derived from the
RefSeq database, http://www.ncbi.nlm.nih.gov/RefSeq/ and the
UCSC database, http://genome.ucsc.edu/cgi-bin/hgGateway.



RESULTS 

Enrichment of the unmethylated fraction of gDNA

Figure 1. Schematic outline of the microarray-based method for identification of DNA methylation differences and DNA polymorphisms in genomic DNA. Left
panel: analysis of DNA sequence variation. Middle panel: the main strategy of the method is based on enrichment of unmethylated DNA fragments. DNA samples are
cleaved by methylation-sensitive restriction endonucleases, and the resulting DNA fragments are then selectively enriched by adaptor-specific aminoallyl-PCRs,
labelled and hybridized to microarrays. Right panel: alternative procedure to enrich the hypermethylated DNA fraction.

The strategy for enrichment of unmethylated portions of the
genome is presented in Figure 1. gDNA is digested with
methylation-sensitive restriction enzymes (Figure 1, middle
panel). Whereas methylated restriction sites remain
unaltered, the sites containing unmethylated CpGs are
cleaved by the enzymes, and DNA fragments with 5'-CpG
protruding ends are generated. The proportion of interrogated
CpG sites depends on the methylation-sensitive restriction
enzymes used for the restriction of DNA. Based on our
analysis of the CpG dinucleotides within the sites of
methylation-sensitive restriction enzymes across several
megabases of human gDNA, the combination of three
enzymes, HpaII, Hin6I and AciI, should interrogate ~32%
of all CpG dinucleotides in mammalian DNA (Table 1). The
addition of two other relatively inexpensive methylation-sensitive
CpG-overhang generating enzymes, HpyCH4IV
and Hin1I, would theoretically increase the proportion of
interrogated CpGs to ~41%. Depending on the
microarray type, in our experiments we usually use either
a single enzyme or a 'cocktail' of up to three restriction
enzymes. The application of a set of enzymes might be
disadvantageous for the analysis of GC-rich regions as
such a strategy would produce restriction fragments too
short for an efficient hybridization. In the latter case, it is
advisable to use a smaller number of restriction enzymes.
Based on our experimental results and computer-based
analysis of 100 randomly selected CpG islands, the most
suitable restriction enzymes are Hin6I and HpaII, followed
by AciI and Hin1I (Table 1). In contrast, for regular DNA
sequences, double- or triple-digest combinations of AciI,
HpaII, HpyCH4IV and Hin6I are recommended.
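
The ~32% and ~41% figures follow from adding the per-enzyme CpG coverage values listed in Table 1. The sketch below makes that arithmetic explicit; it assumes that the recognition sites of the combined enzymes interrogate distinct CpGs, so that the individual percentages can simply be summed, and the dictionary and function names are illustrative.

```python
# Percentage coverage of CpGs in human gDNA, taken from Table 1.
CPG_COVERAGE = {
    "HpaII": 8.6,
    "Hin6I": 6.4,
    "AciI": 17.4,
    "HpyCH4IV": 6.6,
    "Hin1I": 2.0,
}


def combined_coverage(enzymes, table=CPG_COVERAGE):
    """Sum the per-enzyme CpG coverage (%) for an enzyme cocktail."""
    return sum(table[name] for name in enzymes)


triple = combined_coverage(["HpaII", "Hin6I", "AciI"])
five = combined_coverage(["HpaII", "Hin6I", "AciI", "HpyCH4IV", "Hin1I"])
```

The triple digest sums to 32.4% and the five-enzyme cocktail to 41.0%, in line with the rounded values given in the text.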

After the digestion of gDNA, the double-stranded adaptor
U-CG1 is ligated to the CpG-overhangs. At this point, it is
expected that most of the relatively short (<1.5 kb) and amplifiable
DNA fragments derive from the unmethylated DNA
regions. To some extent, the length of the amplified fragments
depends on the primer annealing temperature of the PCR
reaction (Figure 2A). Some ligation fragments, however,
may still contain methylated cytosines. A proportion of
such fragments can be eliminated by treatment with
McrBC, which cleaves DNA containing metC and will not
act upon unmethylated DNA. McrBC restriction sites consist
of two half-sites of the form (G/A)metC, which can be separated
by up to 3 kb (29,30). Hence, as can be seen in Figure 2B,
a proportion of DNA fragments with two or more (G/A)metC
within the restriction fragment are cleaved and therefore
deleted from the subsequent enrichment steps. The remaining
pool of unmethylated DNA fragments is then enriched by
aminoallyl-PCR amplification that uses primers complementary
to the adaptor U-CG1. One important advantage of using
protruding ends in the adaptor-ligation step is that degraded
gDNA fragments (which are common in human post mortem
tissues) will not be ligated and amplified, and therefore will not
interfere with DNA methylation analysis.

Table 1. Enzymes that generate protruding ends in the restriction fragments,
which are complementary to the adaptors U-CG1, TA-1 and AATT-1

Enzymes                 Recognition  Percentage coverage  Number of fragments  Number of fragments
                        sequence     of CpGs in human     (per kb) in          (per kb) in
                                     gDNA (%)             CpG islands*         non-CpG islands*
HpaII (BsiSI)           CCGG         8.6                  3.98                 1.18
Hin6I (HinP1I)          GCGC         6.4                  3.98                 0.61
AciI (SsiI)             CCGC         17.4                 3.23                 1.79
Hin1I (AcyI, BsaHI)     GPuCGPyC     2.0                  1.92                 0.11
HpyCH4IV                ACGT         6.6                  1.31                 1.08
Bsu15I (ClaI, BspDI)    ATCGAT       0.2                  <0.01                0.02
NarI (MlyI)             GGCGCC       0.6                  1.08                 <0.01
Bsp119I (BstBI, AsuII)  TTCGAA       0.1                  0.11                 <0.01
Psp1406I (AclI, PspI)   AACGTT       0.3                  <0.01                0.05
XmiI (AccI)             GTMKAC       0.1                  0.19                 0.34
TasI                    AATT         na                   0.80                 2.88
Csp6I                   GTAC         na                   2.23                 1.41
MseI                    TTAA         na                   0.80                 2.88
BfaI                    CTAG         na                   1.56                 1.55

Asterisk (*) indicates the number of 50 bp to 1.5 kb long ('informative') fragments,
derived from several Mb of randomly selected CpG island and non-CpG
island sequences on chromosomes 1, 2, 4, 5, 6, 9, 17, 19 and 20; bold numbers
represent the most informative enzymes; na = not applicable; M = Adenine or
Cytosine; K = Guanine or Thymine.

Most previous microarray-based epigenetic studies target
hypermethylated DNA sequences (15,17,31,32); however,
interrogation of the unmethylated fraction is significantly
more informative. For example, the 100 kb region of chromosome
22 interrogated by our COMT oligonucleotide
array (TXNRD2-COMT-ARVCF region; Microarray Design)
contains 2193 methylatable cytosines. Enrichment of the
unmethylated fraction can generate up to 401 amplicons of
sufficient size (50 bp to 1.5 kb), each representing the methylation
status of at least one cytosine. In contrast, the combination of
MseI (+BstUI, to remove unmethylated fragments), the enzymes
most frequently used for enrichment of the hypermethylated
fraction (15,17,31,32), would produce 227 amplicons.
Seventy-seven amplicons would either contain no CpG dinucleotides
or would be too short to stringently hybridize to a
microarray. Of the remaining 150 fragments, 144 contain
multiple CpGs; hence, they are not fully informative since a
single unmethylated BstUI restriction site would eliminate the
entire fragment from the eventual amplification. Overall, only
5 of the 2193 methylatable cytosines are truly informative,
and none of these CpG dinucleotides are targeted by BstUI.
Computer-based analysis of 50 randomly selected CpG island
sequences revealed that the unmethylated fraction derived
from HpaII cleavage results in ~22 times more fragments
(19.9 fragments/kb) of the suitable size range (50 bp to
1.5 kb) than the hypermethylated fraction (0.9 fragments/kb)
using MseI.
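
The kind of computer-based analysis referred to here, counting restriction fragments that fall into the 50 bp to 1.5 kb range, can be illustrated with a small in-silico digest. The code is a sketch under stated assumptions rather than the software used in the study: function names are hypothetical, only fragments flanked by two recognition sites are counted (both ends need an adaptor), and methylation is ignored, i.e. every site is treated as cleavable.

```python
import re


def cut_positions(seq, site, cut_offset):
    """Positions at which the enzyme cuts the top strand.

    site       -- recognition sequence without ambiguity codes, e.g. "CCGG"
    cut_offset -- cut position within the site, e.g. 1 for C^CGG
    """
    pattern = "(?=" + re.escape(site) + ")"  # lookahead also finds overlapping sites
    return [m.start() + cut_offset for m in re.finditer(pattern, seq.upper())]


def informative_fragments(seq, site, cut_offset, min_len=50, max_len=1500):
    """Lengths of site-to-site fragments within the informative size range."""
    cuts = cut_positions(seq, site, cut_offset)
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    return [n for n in lengths if min_len <= n <= max_len]


# Toy sequence with three HpaII sites (CCGG) spaced 100 bp and 24 bp apart.
demo_seq = "A" * 10 + "CCGG" + "A" * 96 + "CCGG" + "A" * 20 + "CCGG" + "A" * 5
demo_fragments = informative_fragments(demo_seq, "CCGG", cut_offset=1)
```

Dividing the number of informative fragments by the sequence length in kilobases gives the fragments-per-kb measure used in the comparison above.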

Nevertheless, analysis of the hypermethylated DNA fraction
may also add some new information to the methylation
profiles, especially in the case of hypermethylated CpG islands
or when the overall level of methylation in the genome is low
(e.g. in insects). Thus, we modified previously published
methods for enrichment of methylated sequences to
complement our data from the
unmethylated fraction (Figure 1, right panel). This enrichment
method relies on cleavage with the 4 bp frequent cutters TasI
(↓AATT) and/or Csp6I (G↓TAC). Alternatively, BfaI or MseI
can be used in combination with the Csp6I-specific adaptor.
All four enzymes produce DNA fragments in mammalian
genomes of an average length of 400-750 bp. The recognition
sequences of TasI and Csp6I are infrequent within GC-rich
regions, leaving most CpG islands intact. The analysis of 50
randomly selected CpG islands and several megabases of different
chromosomes revealed that Csp6I would produce more
informative fragments in CpG islands than a digest with MseI,
whereas TasI and MseI produce informative fragments preferentially
in DNA regions outside of CpG islands
(Table 1). After ligation to the AATT- and TA-overhang specific
adaptors 'AATT-1' and 'TA-1', the un- and hypomethylated
ligation products are eliminated from the reaction
by cleavage with a cocktail of methylation-sensitive restriction
enzymes such as HpaII, HhaI (Hin6I), HpyCH4IV, Hin1I and
AciI. Compared with a single digestion with BstUI (17), a
cocktail of restriction enzymes will delete a higher percentage
of unmethylated sequences from the DNA fraction. The
remaining pool of mostly hypermethylated DNA fragments
is subsequently enriched by the aminoallyl-PCR amplification
as described for the unmethylated fraction, and then hybridized
to a microarray (Figure 2C).

Microarray design 

Various aspects of the microarray-based DNA modification
profiling were investigated on the oligonucleotide microarray
that interrogates a ~100 kb fragment on 22q11.2 (Figure 3A). In
addition to the catechol-O-methyltransferase gene (COMT, [MIM
116790]), this chromosomal region also contains the
thioredoxin reductase 2 gene (TXNRD2, [MIM
606448]) and the armadillo repeat gene deleted in velocardiofacial
syndrome (ARVCF, [MIM 602269]). For maximal
informativeness, it is necessary to design oligonucleotides
according to the restriction sites of the methylation-sensitive
endonucleases used for the treatment of gDNA (Figure 3B).
For the COMT array, 384 oligonucleotides were designed,
each 50 nucleotides long, representing every restriction fragment
flanked by HpaII, Hin6I and AciI restriction sites. In
addition, control DNA fragments containing λ phage,
pBR322, φX174, pUC57 and Arabidopsis sequences were
spotted on the array (Materials and Methods). Additionally,
we used the 12 192-element CpG island microarrays and
high-density chromosome 21/22 microarrays (Materials and
Methods).

Detection of confounding effects of DNA sequence 
variation 

Since restriction enzymes are used in the enrichment of differentially
modified DNA fractions, DNA sequence variation
may simulate epigenetic differences. However, until now,
microarray methods used in epigenetic studies have not
differentiated between real DNA methylation differences
and single nucleotide polymorphisms (SNPs) within the
restriction sites of the applied restriction enzymes. This problem
applies to some extent also to the metC antibody-based
strategy (22), which does not differentiate unmethylated CpG
and TpG dinucleotides. In order to exclude the impact of DNA
sequence variation, two approaches are suggested. One is to
check the available SNP databases in order to identify the
DNA sequence variation within the restriction sites of the
enzymes used. For example, our 100 kb COMT array contains
a total of 273 SNPs (SNPper, http://snpper.chip.org/bio/snpper-enter),
of which 101 (37%) reside within CpG dinucleotides
and 55 (20%) are located within the restriction sites of
the four main enzymes used to interrogate methylation
patterns, HpaII, Hin6I, AciI and HpyCH4IV. The majority
of these CpG-SNPs were located in AciI and HpaII restriction
sites, with Hin6I and HpyCH4IV sites containing fewer
polymorphisms (data not shown). Another approach to test
for DNA polymorphisms is the use of restriction endonuclease
isoschizomers with different sensitivity to CpG methylation.
However, this approach is currently only possible for HpaII/MspI
as there are no isoschizomers for most other methylation-sensitive
restriction enzymes.

Nucleic Acids Research, 2006, Vol. 34, No. 2 533

Figure 2. Selective enrichment of restriction fragments with the universal adaptor U-CG1. (A) Scatterplot that shows a comparison of ligation products treated with
McrBC versus the untreated sample on the COMT array. McrBC-treated fragments that contained at least two methylated cytosines were cleaved and could not be
amplified in the following adaptor-PCR, resulting in reduced signal intensities in the Cy5 channel. (B) Co-hybridization of enriched unmethylated (Figure 1, middle
panel) and hypermethylated (Figure 1, right panel) fragments derived from the same DNA source to a CpG island microarray. A large portion of amplicons is present
only in one of the enriched fractions (marked black for log > 0.3, green for log < -0.3). Although the hypermethylated fraction hybridized to ~75% of the
microarray spots, based on our DNA sequence analysis, only a small fraction of them provide epigenetic information in comparison with the unmethylated fraction.

Figure 3. (A) Structure and GC-content of the chromosomal region on human chromosome 22q11.2 that spans the catechol-O-methyltransferase gene (COMT), the
thioredoxin reductase 2 gene (TXNRD2) and the armadillo repeat gene deleted in VCFS (ARVCF). Vertical black bars represent exons. (B) To determine the
methylation profile of the 100 kb TXNRD2-COMT-ARVCF region, 384 oligonucleotides (50mers, black horizontal bars) were designed based on the restriction sites
for the methylation-sensitive endonucleases HpaII, Hin6I and AciI (additional alternative enzymes are HpyCH4IV or Hin1I). Depending on the methylation status of
the CpG-dinucleotides, several combinations of amplicons (grey horizontal bars) can potentially hybridize to the oligonucleotides. (C) Typical hybridization patterns
of the hypomethylated fraction of human gDNA on the COMT oligonucleotide microarray. As discussed in Results, the complexity and informativeness of the
hybridization signals increase with increasing number of methylation-sensitive restriction enzymes.

A third approach to differentiate the DNA sequence
effects from the genuine epigenetic differences consists of
performing an identical microarray experiment on the
same DNA sample that has been stripped of all methylated
cytosines. Our protocol utilizes the Phi29 DNA polymerase
to amplify whole genomic DNA, which creates a copy of
the genome with all methylated cytosines replaced by
unmethylated cytosines. Amplified DNA samples are
then subjected to the same steps as depicted in Figure 1
and hybridized on the microarrays. In this experiment all
of the outliers must be a result of DNA sequence variations
within the restriction sites of the enzymes used. These data
can then be plotted against the DNA methylation data,
which are assayed in parallel (Figure 4). In six experiments
that used amplified genomic DNA, the number of SNP-based
outliers (threshold log-ratio < -0.3 or > 0.3) ranged
from 272 to 741 (432 ± 165, mean ± SD), or 2.2-6.1%
of 12 192 CpG islands. Out of these, 72-234 (120 ± 66,
mean ± SD) were initially identified as DNA methylation
differences in microarray experiments using the unmethylated
fraction derived from the triple-digest with HpaII,
AciI and Hin6I. From the CpG island array studies, our
estimate is that 10-30% of the outliers detected in DNA
methylation experiments could be due to DNA sequence
variation.
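
The logic of this comparison can be sketched as a simple per-spot rule: a spot that is an outlier in the whole-genome amplified (methylation-free) experiment is attributed to sequence variation, whereas a spot that is an outlier only in the methylation experiment is attributed to a methylation difference. The rule, labels and function names below are a simplified assumption made for illustration; the ±0.3 log-ratio threshold and the array size of 12 192 spots are taken from the text.

```python
def classify_spot(meth_log_ratio, amp_log_ratio, threshold=0.3):
    """Label one spot from its two log(Cy5/Cy3) ratios.

    meth_log_ratio -- ratio from the methylation experiment
    amp_log_ratio  -- ratio from the whole-genome amplified control
    Returns "S" (sequence variation), "M" (methylation difference) or "-".
    """
    if abs(amp_log_ratio) > threshold:
        return "S"  # difference persists without methylation
    if abs(meth_log_ratio) > threshold:
        return "M"  # difference seen only when methylation is present
    return "-"


def outlier_percentage(n_outliers, n_spots=12192):
    """Outliers expressed as a percentage of all spots on the array."""
    return 100.0 * n_outliers / n_spots


labels = [classify_spot(0.5, 0.0), classify_spot(0.5, 0.6), classify_spot(0.1, -0.1)]
low, high = outlier_percentage(272), outlier_percentage(741)
```

Applied to the reported range of 272 to 741 SNP-based outliers, the percentage helper reproduces the quoted 2.2-6.1% of the array.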



Reproducibility 

To test the reproducibility of the method, a genomic DNA
sample was split and subjected to the procedure of enrichment
of the unmethylated fraction. The resulting amplification products
were labelled with Cy5 and Cy3 and then co-hybridized
on the COMT array, which contains probes that flank the
HpaII, Hin6I and AciI restriction fragments around the
COMT gene. The Cy3 and Cy5 hybridization intensities exhibited
very similar values (R² = 0.997; Figure 5A). Analogous
experiments, including switch dye hybridizations, were
repeated several times also with the CpG island arrays and
in all cases were highly reproducible (R² > 0.97).
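
The reproducibility measure used here can be computed directly from paired channel intensities. The sketch below assumes that the reported value is the squared Pearson correlation between the Cy3 and Cy5 signals of the same spots; the function name and the example intensities are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical Cy3/Cy5 intensities of a split sample on the same spots.
cy3_signal = [120.0, 850.0, 2300.0, 5400.0, 9800.0]
cy5_signal = [130.0, 800.0, 2450.0, 5200.0, 10100.0]
reproducibility = r_squared(cy3_signal, cy5_signal)
```

In practice the same calculation is often run on log-transformed intensities so that a few very bright spots do not dominate the correlation.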




Figure 4. Combined methylation- and SNP-analysis on a CpG island microarray.
The data of two separate hybridizations of DNA samples derived from
post mortem brain of two individuals are plotted against each other. The Y-axis
contains the data derived from a methylation analysis (triple-cleavage with
HpaII, Hin6I and AciI), whereas the X-axis contains the SNP data derived from
the hybridization of the same DNA samples, which were subjected to
whole-genome amplification prior to cleavage by the methylation-sensitive restriction
enzymes (Materials and Methods). Scale: log (Cy5/Cy3); an increased log-value
on the y-axis is indicated by red versus a decreased log-value represented
by green. Significant outliers (log-ratio < -0.3 or > 0.3, 2-fold difference) can be
classified into four clusters (S = SNPs, M = DNA methylation differences),
enabling the differentiation of epigenetic differences and nucleotide
polymorphisms between the test-samples. Amp = whole-genome amplified
sample.

Another critical factor in the amplification of unmethylated
or hypermethylated DNA fragments is to ensure that no
sequence specific bias is introduced. The rate of amplification
of repetitive sequences generally declines faster than that of
less abundant fragments in the later cycles of PCR (33). With
increasing amplification cycles, repetitive DNA strands reach
relatively high concentration and begin re-annealing to each
other during the steps below the DNA melting temperature. To
avoid this, a two-temperature PCR that uses a combined
high-temperature elongation-annealing step was applied. A series
of experiments was performed to investigate how the number
of PCR cycles would affect the hybridization patterns. As can
be seen in Figure 5B, the relative intensities of the hybridization
signals of both single copy sequences and repetitive DNA
fragments were similar in the range of 20-30 amplification
cycles (R² = 0.991). Only when the cycle number was increased
beyond 40 cycles was a biased amplification of some DNA
sequences observed (data not shown).



[Figure 5, panels A-D: scatter plots. Recoverable legend labels: single copy sequences; repetitive sequences; 40 bp and 50 bp probes; 164 bp and 246 bp fragments; 2 µg control (Cy5).]



Sensitivity 

To test if differentially represented DNA fragments in two
different DNA samples can be detected by this method,
human gDNA was 'spiked' with unmethylated heterologous DNA,
λ phage and pBR322 plasmid, prior to methylation-sensitive
cleavage (Figure 5C). Each sample was supplemented
with a different amount of spike DNA, thereby
mimicking differentially methylated sequences. The exact
amounts of λ and pBR322 corresponded to increasing numbers
of human genomic equivalents (1 GE of 'spike' DNA equals
16.28 pg/µg gDNA of λ and 1.45 pg/µg gDNA of pBR322,
respectively). Hence, each of the experiments compared the
intensities generated by 1 GE of λ plus 128 GE of pBR322
(y-axis) versus 16 GE of λ plus 8 GE of pBR322 (x-axis).
While the plotted signal intensities of the human gDNA
sequences are positioned on or close to the regression line
(indicating no methylation difference), the λ and pBR322
fragments were identified as outliers. The average signal



Figure 5. Reproducibility and sensitivity of the method. (A) A COMT microarray
scatter plot representing two sets of amplification products derived from the same
DNA source but produced at different time points by different researchers. The
high correlation coefficient of signal intensities demonstrates a high
reproducibility of the method. (B) Influence of the PCR cycle number. Scatter plot
diagrams show hybridization signal intensities of the unmethylated fraction that
was amplified using 20 PCR cycles (Cy3 channel) and 30 cycles (Cy5 channel).
Amplification products of each PCR were co-hybridized to the COMT microarray
that contained oligonucleotides representing single copy sequences (closed
circles), partially repetitive sequences (grey squares; 15-99 copies/genome)
and highly repetitive DNA fragments (open squares; >100 copies/genome), such
as ALU and LINE repeats. (C) Scatter plot representing the unmethylated fraction
of human gDNA 'spiked' with different amounts of control DNA. The test samples
were hybridized to the COMT array and contained either a 16-fold excess of
λ DNA (16 genome equivalents [GE] versus 1 GE; 10 fragments) or a 16-fold
excess of pBR322 (128 GE versus 8 GE; 2 fragments), respectively. The amplicons
of the spiked DNA (representing unmethylated DNA) can be easily
distinguished as outliers, whereas the signals representing gDNA are located close
to the regression line. Median signal intensities of different length
oligonucleotides (40-50 bases) that target a specific HpaII restriction fragment in λ DNA
reveal that the length of spotted sequences directly influences the spot intensity and
therefore the sensitivity of the microarray. (D) Sensitivity of the CpG-island
microarray hybridization. Control amplicon (2 µg) (post mortem brain, unmethylated
fraction) was labelled with Cy5 and co-hybridized with 2 µg (0% difference),
1.9 µg (5% difference), 1.8 µg (10% difference), 1.5 µg (25% difference) or 1.0 µg
(50% difference) of Cy3-labelled amplicon. For each hybridization to a COMT
array, the regression lines represent the overall intensity that mimics methylation
differences over the entire sample. The decrease of amount of DNA is reflected in
the angle of the regression lines, which deviated by 5-7% from the expected
values.
intensity ratio of λ oligonucleotides was 15.4, which is very
close to the ratio of spiked DNA (16:1). The intensity values
for pBR322 were not as linear and exhibited a 6.5- to 10-fold
difference (the same 16-fold ratio was expected), most likely due to
saturation effects of the hybridization.
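
The genome-equivalent amounts quoted above follow from the ratio of genome sizes, which the short sketch below makes explicit. The genome sizes are assumptions introduced for illustration (about 3.0 x 10^9 bp for the haploid human genome, 48 502 bp for λ and 4361 bp for pBR322) and are not stated in the text; the function name is likewise hypothetical.

```python
HUMAN_HAPLOID_BP = 3.0e9  # assumed approximate haploid human genome size
LAMBDA_BP = 48_502        # assumed size of the lambda phage genome
PBR322_BP = 4_361         # assumed size of the pBR322 plasmid


def spike_pg_per_ug(spike_bp, genome_equivalents=1, host_bp=HUMAN_HAPLOID_BP):
    """Mass of spike DNA (pg) per 1 ug of gDNA for a given number of GE."""
    pg_per_ug = 1e6  # 1 ug = 1 000 000 pg
    return genome_equivalents * spike_bp / host_bp * pg_per_ug


lambda_1ge = spike_pg_per_ug(LAMBDA_BP)  # mass for 1 GE of lambda
pbr322_1ge = spike_pg_per_ug(PBR322_BP)  # mass for 1 GE of pBR322
```

With these assumed sizes, 1 GE corresponds to roughly 16.2 pg of λ and 1.45 pg of pBR322 per µg of gDNA, close to the values given in the text; the small deviation for λ presumably reflects a slightly different human genome size used in the original calculation.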

In order to determine the sensitivity of the hybridization per
se, a control amplicon DNA was compared with itself while
decreasing the amount of DNA in one channel by 5, 10, 25 and 50%. On the
global level, the regression lines [y = f(x)] reflected
reproducible differences in the amount of amplicon DNA used in the
hybridization and varied by 5-7% from the expected values
(Figure 5D). Individual sites exhibited a lower accuracy,
which depended on the signal intensity, i.e. the stronger the
signal, the closer the observed spot intensity was to the
expected one. The rate of false outliers (log-ratio < -0.3, > 0.3; 2-fold
difference) was on average 3%. Usually, replication of
microarray experiments reduced the degree of aberration
(log-ratio < -0.3, > 0.3) below 2% for all types of microarrays.
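
The two quantities discussed in this paragraph, the slope of the global regression line and the rate of false outliers, can be sketched as follows. This is an illustration under assumptions: the regression is forced through the origin, the logarithm is taken to base 10, and both function names are hypothetical. The text does not specify how the published regression lines were fitted.

```python
import math


def slope_through_origin(xs, ys):
    """Least-squares slope of y = a * x (no intercept; an assumption)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def outlier_rate(cy5, cy3, threshold=0.3):
    """Fraction of spots whose |log10(Cy5/Cy3)| exceeds the threshold."""
    flagged = sum(
        1 for a, b in zip(cy5, cy3) if abs(math.log10(a / b)) > threshold
    )
    return flagged / len(cy5)
```

For a 50% dilution of one channel the expected slope is 0.5, so a fitted slope can be compared directly with the dilution factor to express the deviation from the expected value.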

Examples of DNA methylation profiles 

Identification of DNA modification differences is illustrated in a
series of examples below. The COMT oligonucleotide array
was used to identify DNA methylation changes in a brain
tumour (Figure 6A). In contrast to the pair of control brain
DNA samples, where hybridization signals are close to the
regression line (indicating similar DNA methylation patterns),
a visible proportion of the hybridization signals originating
from the unmethylated DNA fraction of the brain tumour
deviates from the regression line. More subtle changes in
DNA methylation patterns have been identified when post
mortem brain tissues of healthy individuals were compared
with the same tissues from schizophrenia patients (A. Schumacher,
A. Petronis, manuscript in preparation; a representative
example is shown in Figure 6B). The differences between the cancer
and psychosis studies show that diseases other than cancer
may reveal more subtle epigenetic differences, and therefore
the informativeness and sensitivity of the epigenetic profiling
method are of critical importance.

Another application of the technology includes epigenetic
profiling of different tissues. One example of tissue specific
effects is shown using the CpG island microarrays that contain
12 192 CpG island clones, of which 8025 represent unique
sequences. CpG islands tend to be found in many promoter
sequences and their methylation has profound effects on gene
silencing in mammalian genomes. The scatter plot shows two
distinct spot areas, which represent predominantly unmethylated
fragments in placenta (yellow spots) and brain (orange
spots), respectively (Figure 7A). About 11% of the CpG island
fragments exhibited a 2-fold or greater signal intensity difference
between the two tissues. Some of the strongest brain-specific
signals could be identified for CpG islands associated
with neuronal genes such as DPYSL5, FABP7, DIRAS2,
GRINM, SLC24A3 and DSCAML1, whereas strong placenta-specific
outliers were associated with genes expressed in
placenta, such as PCM1, CCND1, HA-1 and ADAMTSU.
Overall, analysis revealed that brain DNA harboured notably
more unmethylated CpG islands than placenta DNA.

Verification of detected methylation differences 

Several loci that displayed methylation differences in our
experiments were selected for verification by sodium bisulfite
modification mapping of methylated cytosines (Materials
and Methods). The technique is based on the reaction of gDNA
with sodium bisulfite under conditions such that cytosine is
deaminated to uracil but 5-methylcytosine remains unaltered.
In the sequencing of amplified products, all uracil and thymine
residues are detected as thymine and only mC residues remain
as cytosine. The sites for the methylation-sensitive restriction
enzymes used in our experiments showed the expected
methylation difference across the DNA samples, as exemplified
for CpG island clones located in the promoter region
of galectin-1 and in the promoter region of a brain-specific
transcript CR606704 (Figure 7B and C).

Chromosome-wide mapping of DNA methylation 
differences 

Analysis of the unmethylated fraction from brain-specific
DNA of eight adults using a chromosome 21/22 tiling array
detected 488-747 unmethylated sites per sample (Table 2).
This number increased to 977 in a merged map, showing that
many sites were common between different individuals. The
vast majority of the sites (~90%) were positioned outside of
the 5' ends and 5' flanking regions of the genes, consistent with
abundant transcriptional activity and a significant fraction of
transcription factor binding sites found outside of known
annotations (26,27,34). The unmethylated sites outside of
the 5' ends of known genes were about equally distributed




Figure 6. Applications of the epigenetic profiling technology. (A) Changes of methylation profiles at TXNRD2-COMT-ARVCF in a brain tumour. The data from two
different microarray experiments are superimposed over each other. The analysis of two post mortem brain samples (closed dots) reveals no major difference in
methylation levels, whereas the signal intensities vary significantly in the brain tumour (grey dots) when compared with the normal brain. (B) The comparison of
DNA methylation profiles using the COMT microarray in brain tissue of a healthy control and a schizophrenia patient displays subtle epigenetic differences.
Figure 7. Examples of applications using a CpG island microarray. (A) Hybridization of the unmethylated fraction of placenta DNA and post mortem brain DNA to a
CpG island array. Two pools of CpG island elements could be identified, which display extensively different methylation levels between these tissues (Note: some of
the identified differences could be due to DNA sequence variation). (B) To validate the identified methylation differences, several CpG islands were subjected to
bisulfite modification based mapping of methylated cytosines as exemplified for CpG island clones 22_B_12 (promoter region of Galectin-1) and 52_C_03 (promoter
region of a brain-specific transcript, CR606704). The top sequence shows the reverse strand (-) of the original restriction sites, the bottom sequence displays the
bisulfite-modified DNA. For each bisulfite-modified CpG island, 8-10 clones were sequenced per tissue. Sequence 52_C_03 revealed several fully methylated
CpGs in placenta, which were unmethylated in brain. In contrast, clone 22_B_12 showed subtler methylation differences (15-100%), depending on the position of the
CpG dinucleotide. (C) Methylation patterns of clones 22_B_12 and 52_C_03 derived from bisulfite sequencing of 10-12 clones per tissue. The yellow boxes indicate
CpG dinucleotides that are shown in the sequenced graph (Figure 7B).



Table 2. Interindividual differences and distribution of the detected unmethylated sites with respect to the known genes as defined by the combined set of RefSeq and
UCSC known genes for each brain DNA sample (M17-M25) and the merged map

Individual      3'-flanking  3'ter  5'-flanking  5'flanking-3'flanking  5'ter  Distal   Internal  Total  Site coverage (bp)
M17 chr21/22    13/12        2/16   8/20         2/4                    10/20  64/122   98/97     488    64943/134730
%Total          5.1          3.7    5.7          1.2                    6.1    38.1     40.0
M18 chr21/22    17/22        9/15   13/29        3/3                    16/28  95/191   134/152   727    98456/236797
%Total          5.4          3.3    5.8          0.8                    6.1    39.3     39.3
M19 chr21/22    15/24        11/14  12/27        2/5                    14/21  86/173   119/130   653    88290/221721
%Total          6.0          3.8    6.0          1.1                    5.4    39.7     38.1
M21 chr21/22    20/24        12/18  15/29        2/5                    14/22  102/184  143/157   747    109595/252347
%Total          5.9          4.0    5.9          0.9                    4.8    38.3     40.2
M22 chr21/22    18/20        8/17   9/29         3/6                    15/24  86/169   127/143   674    87604/213453
%Total          5.6          3.7    5.6          1.3                    5.8    37.8     40.1
M23 chr21/22    12/15        4/13   10/25        2/3                    10/21  68/150   101/111   545    70912/163322
%Total          5.0          3.1    6.4          0.9                    5.7    40.0     38.9
M24 chr21/22    14/18        5/12   7/20         4/3                    10/20  61/158   88/107    527    65639/187229
%Total          6.1          3.2    5.1          1.3                    5.7    41.6     37.0
M25 chr21/22    17/15        7/13   10/18        3/3                    9/22   65/171   102/97    552    69937/171073
%Total          5.8          3.6    5.1          1.1                    5.6    42.8     36.1
Merged chr21/22 26/28        13/22  19/36        4/9                    19/34  142/237  187/201   977    152148/314374
%Total          5.5          3.6    5.6          1.3                    5.4    38.8     39.7

'5'ter' or '3'ter' refers to a 5' or 3' terminal site internal and within 1 kb of a gene boundary. '5'flanking' or '3'flanking' refers to a site outside and within 5 kb of a gene
boundary; 'internal' refers to an intronic site and 'distal' refers to an intergenic site outside of the -5 kb/+1 kb boundaries. A site can also be both 5' and 3' flanking in
a gene rich region and is then referred to as '5'flanking-3'flanking'.



between sites residing within introns of known genes and
outside of the gene boundaries. Interestingly, while some
genes, like BCR, showed a large number of sites inside
the gene boundaries, some loci, like C21ORF55 spanning
~150 kb, were essentially devoid of internal unmethylated
sites and in some cases, such as the SIM2 locus, the unmethylated
sites were limited to the first intron (Figure 8A-C). Such
intragenic methylation may inhibit inappropriate transcriptional
initiation at cryptic sites (35) or may serve as a regulator
of alternate transcripts, as can be seen for SIM2. Overall,
unmethylated sites detected in this study cover ~0.47 Mb
or ~4% of the 12 Mb of non-repetitive sequences of chromosomes
21 and 22 interrogated in the combined map of all eight
individuals, with an average of 0.28 Mb (2.3%) in any given
individual. Maps of the methylation patterns (average value
of the eight tested individuals) of the q-arms of chromosomes
21 and 22 are shown in Figure 9A-B. Detailed maps of all
individuals for chromosomes 21 and 22, linked to the UCSC
Genome Browser (http://genome.ucsc.edu), are also available
on our web-based methylation database (Web Resources).
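
The coverage figure for the merged map can be checked against the site coverage column of Table 2. In the sketch below the two merged-map values are read as chromosome 21 and chromosome 22 coverage, which is an interpretation of the 'chr21/22' row label; the function name is hypothetical, and the 12 Mb denominator is the approximate value stated in the text.

```python
# Merged-map site coverage from Table 2, read as chr21/chr22 (an interpretation)
MERGED_SITE_COVERAGE_BP = {"chr21": 152_148, "chr22": 314_374}
INTERROGATED_BP = 12_000_000  # ~12 Mb of non-repetitive sequence (from the text)


def coverage_summary(site_bp, interrogated_bp):
    """Return total covered bases and their percentage of the interrogated bases."""
    total_bp = sum(site_bp.values())
    return total_bp, 100.0 * total_bp / interrogated_bp


total_bp, percent = coverage_summary(MERGED_SITE_COVERAGE_BP, INTERROGATED_BP)
```

The total of 466 522 bp corresponds to the ~0.47 Mb quoted above, and the resulting percentage of about 3.9% is consistent with the rounded value of ~4%.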
Figure 8. Profiles of unmethylated sites in three loci on human chromosomes 21 and 22 (501 bp window, Materials and Methods): BCR (A), C21ORF55 (B) and SIM2
(C) for human brain DNA (average of eight individuals, M17-M25). The graphs are based on P-values for each individual interrogation that show the significance of
the enrichment in the unmethylated fraction versus total gDNA. The P-values were converted to the (-10 log10) scale, such that, for example, a P-value of 10^-4
becomes 40. The vertical axes are adjusted to represent probes in the 40-120 range (P-values of 10^-4 to 10^-12); thus only probes that pass the P < 10^-4 threshold are
shown. Enlarged is a part of the chr 22q11.21 region (181 bp window), spanning breakpoints found in the generation of the two alternative forms of the Philadelphia
chromosome translocation. C = gDNA control.
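
The P-value scaling used for the graphs in Figure 8 is a simple transformation, sketched here for clarity. The function names are hypothetical and the snippet illustrates only the conversion and the display threshold, not the statistical test that produced the P-values.

```python
import math


def p_to_score(p_value):
    """Convert a P-value to the -10 * log10 scale used on the vertical axis."""
    return -10.0 * math.log10(p_value)


def is_displayed(p_value, p_threshold=1e-4):
    """Only probes with P below the threshold are shown in the profiles."""
    return p_value < p_threshold
```

On this scale a P-value of 10^-4 maps to 40 and a P-value of 10^-12 maps to 120, which are the limits of the plotted range.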



A comparison of the hypomethylation tracks with data from
the Affymetrix transcriptome project (26,36) indicates that
many of the unmethylated chromosomal regions overlap
with mapped transcriptionally active regions (Figure 9A-C,
bottom tracks). These DNA methylation data complement existing
studies on transcriptional activity and histone
modifications on human chromosomes 21 and 22 (37). We
found that in the majority of cases, specific histone modification
patterns reported by Bernstein et al. (37) for the human
hepatoma cell line HepG2 overlapped notably with the
observed DNA methylation patterns. An example is shown
in Figure 9C for the PEX26 gene, which is ubiquitously
transcribed in most tissues. The gene harbours an extensively
unmethylated CpG rich region in its promoter. The comparison
of the different epigenetic profiles of both studies shows
that the same genomic region was also highly acetylated at
Lysine 9 and 14 of histone 3 (H3), accompanied by H3
di- and trimethylation of Lysine 4. A comparison of histone
modification tracks and our hypomethylation patterns for the
q-arms of chromosomes 21 and 22 revealed that H3 acetylation
and Lys4 methylation usually correlated with unmethylated
CpGs.

DISCUSSION 

Microarray based technology for DNA modification analysis
enables the highly parallel screening of numerous restriction
fragments representing DNA methylation profiles over large
segments of gDNA. Building on the principles described in
earlier publications (11-23), our method addresses a series of
critical issues and exhibits several advantages. An earlier
method (18) used a sucrose gradient to enrich the unmethylated
DNA fraction. This method, however, requires a large
amount of DNA template and is rather imprecise in terms
of the upper limit of the fragments that are subjected to
hybridization. Other microarray methods for DNA methylation
analysis can be categorized into three main classes, which
are based on: (i) identification of bisulfite induced C→T
transitions (11-13,38,39), (ii) cleavage of gDNA by
methylation-sensitive restriction enzymes and (iii) immunocapturing with
antibodies against methylated cytosines. In the bisulfite arrays,
each tested CpG is represented by a pair of either C(G) or T(A)
nucleotides. The arrays contain oligonucleotides that measure
the C(G)/T(A) ratio in the bisulfite treated DNA (corresponding
to mC/C in the native DNA). Although informative and
precise, these microarrays can contain only a limited number
of oligonucleotides because treatment with bisulfite degenerates
the 4 nt code, resulting in a loss of specificity for a large
portion of the genome. For example, after bisulfite treatment
all of the possible 16 permutations of a four base sequence
containing unmethylated C and T (CCCC, CTCT, CCCT,
CCTT, TCTC, TTTC, TTTT and so on) will become identical
TTTT. The bisulfite method is also laborious and cannot be
easily applied to profile a large set of samples. Furthermore, it
is difficult to design suitable oligonucleotides that would


Nucleic Acids Research, 2006, Vol. 34, No. 2 539 



[Figure 9: genome browser views. Recoverable track labels: CpG islands; Dimeth Lys4; individual samples M17-M25; mRNA; Affy Txn; transcription start.]



Figure 9. Genomic views showing unmethylated regions on chromosomes 21 and 22. (A and B) The top tracks (dark red) in the two chromosomal graphs show the
average amount of hypomethylation in the brain cortex of eight adult individuals. Also displayed are known genes (dark blue) and CpG islands (green). The bottom
tracks display transcriptome data derived from 11 different tissues from the Affymetrix transcriptome phase 2 study (36). The track is coloured blue in areas that are
thought to be transcribed at a statistically significant level. Regions that have a significant homology to other chromosomal regions or that overlap putative
pseudogenes are coloured in lighter shades of blue. All other regions of the track are coloured brown. (C) Enlarged is a part of chromosome 22q11.21, containing the
peroxisome biogenesis factor 26 (PEX26, MIM 608666) that shows correlation between histone modifications and unmethylated DNA in its promoter region. The top
three tracks represent histone modification data for H3 Lys4 dimethylation (orange bar), H3 Lys4 trimethylation (blue bar) and H3 Lys9/14 acetylation (yellow bar)
(37). Underneath are the tracks for the average methylation patterns (unmethylated sites) observed in brain and the individual methylation patterns of all tested
individuals (dark red). It is noteworthy that methylation patterns exhibit some interindividual differences (indicated by arrows).



exhibit similar melting temperatures since the specificity of
base discrimination varies considerably (12). Using our
approach, arrays can contain an almost unlimited number of
oligonucleotides: coverage can range from individual genes to
entire chromosomes represented by millions of oligonucleotides
on glass chips. Whole genome tiling arrays are already
available for Arabidopsis thaliana and Escherichia coli, and
will soon be available for the entire human genome.
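
The loss of sequence complexity after bisulfite treatment mentioned in the four-base example can be verified by enumeration. The sketch below assumes fully unmethylated input and models conversion as a simple C to T replacement on one strand; the function name is hypothetical.

```python
from itertools import product


def bisulfite_convert(sequence):
    """Read-out of an unmethylated sequence after bisulfite treatment and PCR."""
    return sequence.replace("C", "T")


# All four-base sequences composed of unmethylated C and T
tetramers = ["".join(bases) for bases in product("CT", repeat=4)]

# Distinct sequences that remain after conversion
converted = {bisulfite_convert(s) for s in tetramers}
```

All 16 starting sequences collapse onto the single sequence TTTT, which is why probes designed against bisulfite-treated DNA lose specificity over large parts of the genome.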
Restriction enzyme based methods are used to enrich either
the hypermethylated or unmethylated fraction of gDNA.
Methods relying on the enrichment and detection of
hypermethylated DNA have predominantly been used to identify
abnormally methylated CpG islands in malignant cells
(15-17,31). Although this strategy seems to be useful for detecting
major epigenetic changes in some regions of the genome, the
overall proportion of interrogated CpG sites is substantially
lower compared with that achieved using approaches based
on the analysis of the unmethylated fraction. As shown in
Results, we have estimated that interrogation of the unmethylated
fraction of gDNA could be up to several hundred-fold
more efficient than analysing the hypermethylated fraction.
Furthermore, since unmethylated cytosines are less
abundant in the genome than methylated cytosines (depending
on the tissue, 70-90% of cytosines are methylated), analysis of
the smaller unmethylated fraction of gDNA is more sensitive
to subtle changes. For example, an increase of 10%
from the normal density of mC would result in a 100%
(from 20 to 10%) difference in the unmethylated fraction,
but only a 12% (from 80 to 90%) difference in the
hypermethylated fraction of gDNA. The unmethylated fraction has
been used in some approaches employing class II microarray
methods, for instance by using the methylation-specific
McrBC enzyme (23) to deplete the hypermethylated fraction.
However, the remaining unmethylated DNA fragments
(>1 kb) have to be gel-purified, requiring large amounts of
starting material. Additionally, the McrBC method may not be
able to differentiate between dense and sparse methylation
within relatively short DNA fragments. For example, the
2 kb human COMT promoter region, which contains 27
McrBC target sites, can be cut into fragments shorter than 1 kb
whether 2 (7%) or all 27 (100%) of the McrBC sites are
methylated. Furthermore, the McrBC method cannot
differentiate between unmethylated and polymorphic cytosines.
Another method to enrich the unmethylated fraction uses
the rare cutter NotI (5'-GCGGCCGC-3') (19-21). However,
NotI sites are not well represented in the genome and will only
provide a very superficial overview of genomic methylation
patterns. An alternative to these methods is the use of
antibodies specific for methylated cytosines [MeDIP (22)]. In this
method, antibodies are used to immunocapture methylated
genomic fragments. However, this approach requires large
amounts of gDNA (>8 µg) and also relies on the enrichment
of the less informative hypermethylated fraction of the
genome.
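
The arithmetic behind the sensitivity argument for the unmethylated fraction can be written out explicitly. In the sketch below the 'difference' is taken as the ratio of the larger to the smaller fraction minus one, a definition inferred from the numbers quoted in the text rather than stated by it; the function name is hypothetical.

```python
def percent_difference(fraction_a, fraction_b):
    """Relative difference between two fractions, as a percentage of the smaller."""
    larger = max(fraction_a, fraction_b)
    smaller = min(fraction_a, fraction_b)
    return (larger / smaller - 1.0) * 100.0


# mC density rises from 80% to 90%, so the unmethylated share falls from 20% to 10%
unmethylated_change = percent_difference(20, 10)
methylated_change = percent_difference(80, 90)
```

Under this definition the unmethylated fraction changes by 100% whereas the methylated fraction changes by 12.5% (rounded to 12% in the text), an 8-fold larger relative signal for the same underlying event.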

In our analyses, we have addressed another important
issue: the interference of DNA polymorphisms that may
simulate DNA modification differences across individuals.
Data from the SNP consortium indicate that roughly every
360th nucleotide in the human genome represents an SNP. In
humans, ~2.16 million SNPs are detected in CpG dinucleotides,
and such CpG SNPs are 6.7-fold more abundant
than expected (40). Depending on the restriction enzyme
combination, our CpG island array-based studies demonstrated
that 10-30% of all outliers initially detected as
methylation differences contained SNPs (Figure 4). Information
on the SNPs and other polymorphisms such as deletions,
inversions or duplications within the restriction sites of the
enzymes used for the enrichment of the unmethylated or
hypermethylated fractions is helpful in differentiating the
epigenetic variations from the DNA sequence ones. To
minimize the effects of DNA polymorphisms, it may also be
beneficial to compare affected tissue and healthy cells
from the same individual.

Another advantage of PCR-based methylation profiling
methods is the ability to work with limited DNA resources.
Although our basic protocol requires about 500 ng of gDNA,
the amount of template DNA can be significantly lower. In
our recent experiments, methylation patterns at the COMT
region generated from a relatively small number of Jurkat
tissue culture cells (up to 500 cells, or 3 ng of DNA) did
not reveal any significant differences when compared with
the methylation patterns generated from a substantially larger
number of cells from the same tissue.

There are, however, also some limitations to the technology
described in this article. The methylation sensitive
restriction enzymes do not interrogate every cytosine, and
with our current design, more than half of CpG sites remain
uninterrogated. This may be critical when the phenotypic
outcomes are determined by a methylation change at an isolated
cytosine that is not within the restriction site of a methylation
sensitive restriction enzyme. This problem may be partially
overcome by the application of the same arrays to the CpG
specific immunoprecipitation technique (MeDIP) (22) in
addition to histone modification analysis through ChIP technology,
which identifies DNA sequences associated with modified
histones (10). DNA and histone modifications seem to be
inter-dependent, and consequently a combined
approach that interrogates both DNA methylation and
chromatin modification in parallel might be a productive
route to the fine mapping of epigenetic changes. Also,
asymmetrical mC sites (CpNpN) that are found in plants
and some fungi such as Neurospora crassa are difficult to
detect, although some methylation-sensitive type IIs restriction
enzymes are available (e.g. Esp3I or BveI). However,
methylation of asymmetrical sites in animal organisms is
not common. Additionally, this array method can also be
modified for analysis of methylated adenines in plants and
bacteria.

In summary, the use of microarrays targeted at unmethyl- 
ated cytosines is a high-throughput approach to profile DNA 
methylation patterns across the genome. The ability to analyse 
minute amounts of DNA may enable the epigenetic screening 
of DNA in plasma, serum or other body fluids, as well as in 
prenatal diagnostics. Although all the examples provided in 
this work investigated human DNA, the same strategies can be 
used for the epigenetic analyses of numerous other species. It 
is evident that epigenetic profiling should be performed in a 
systematic, unbiased fashion and not limited to the tradition- 
ally preferable regions such as CpG islands. Outside of CpG 
islands, numerous other genomic loci exist that may be sites 
for important epigenetic modification, including enhancers, 
imprinting control elements (41) or the regions that encode 
regulatory RNA elements. 

The technology described above, in combination with
existing epigenetic profiling methods, may help to identify
inter-individual variation in genome-wide methylation patterns
and epigenetic changes that arise during tissue
differentiation, and to understand the epigenetic effects
of various environmental factors. Of particular interest is the
application of high-throughput DNA methylation analyses to
address the molecular basis of various non-Mendelian
irregularities of complex diseases, such as discordance of
monozygotic twins, remissions and relapses of a disease, parent of
origin- and sex-effects, and tissue- and site-specificity (42).
Further technological developments may include building
high-resolution oligonucleotide-based microarrays spanning
the entire human genome, improving the enrichment strategies
through the application of more specialized methylation
sensitive restriction enzymes, and substantial reduction in
the amount of initial template DNA down to the amount
of a haploid or diploid genome. All these developments
will provide the basis for identifying the methylation profile
of the entire genome in a single cell, one of the 'quantum
leaps' in post-genomic biology (43).